1. Introduction


There is a great deal of interest in the U.S. military in monitoring the health of assets operating
in the field. The primary motivation is to enable timely, efficient, and effective decision making
for both operations and logistical support. With this in mind, the U.S. Army Research Laboratory
(ARL) has teamed with the U.S. Army Tank and Automotive Research, Development and Engineering
Center (TARDEC) to investigate approaches for assessing the health of diesel engines (Technology
Program Annex TA-SE-2010-5). Seeded fault testing was executed with the assistance of Millennium
Integrated Services (MIS) 2000/Global Defense under ARL contract W911NF-09-2-0036. This report
presents progress in this area and, specifically, reviews the efficacy of algorithms that detect
anomalous conditions intentionally imposed on the system (seeded faults) and then identify the
source of the variation that caused the anomaly. The effort is also expected to be applicable to
other areas of interest in ARL’s prognostics and diagnostics (P&D) program.


2.  Experimental 


A military version of the CAT 7 diesel engine (Model C7 DITA) was installed and instrumented in a
dynamometer (dyno) test cell at TARDEC’s facilities in Warren, MI (figure 1). The basics of the
setup and the data collected are described here; for a detailed description of the experiment, see
reference 1. The setup was designed so that the engine could be operated and controlled without
the presence of a vehicle. The test stand supplied fuel, coolant, and inlet air, handled the engine
exhaust, and provided a load (a computer-controlled eddy current dyno). Data were collected from a
variety of sources: existing sensors on the engine, read through the controller-area network (CAN)
vehicle bus; several sensors in the test cell, recorded by the cell data acquisition system (DAQ);
and a few “add-on” sensors that were recorded at a higher rate (referred to as “analog data”). A
small portion of the data, referred to as “digital data,” is used primarily for timing. Additional
sensors were installed, and their data collected, by the Pennsylvania State University (Penn State)
Applied Research Laboratory. Both the CAN and dyno data were collected at a relatively low rate,
provided to ARL at 1 Sample/s, and could be monitored continuously during a test run. The analog
data were collected at 10 kiloSamples/s; because of the high rate, “snapshots” of between 1 and
30 s were collected at select times during a test run. The Penn State data were collected
independently at 102.4 kiloSamples/s, without time synchronization to the TARDEC data. A diagram
of the engine control, instrumentation, and data flow is shown in figure 2.

Figure 1. Instrumented CAT 7 engine in the TARDEC test cell.

Figure 2. Engine control, instrumentation, and data flow.

Test runs were performed with various seeded faults and with no-fault cases. A test run consisted
of running the engine through a stepwise sequence of designated speeds, holding each speed for a
short time, as shown in figure 3; each run had either no fault or one particular seeded fault. The
same sequence of engine speeds and durations was used for all the tests. As can be seen, there are
six speeds held for between 1 and 3 min each; the duration at a given speed set point was not
precisely controlled.


Figure  3.  Typical  stepped  control  of  engine  speed  for  a  performance  run. 


3.  Data  for  Analysis 


The current focus is on the performance test data since these files have several baseline runs
along with several seeded fault runs. Baseline runs are test sequences at the beginning of a test
day in which there was no fault but the standard test sequence was followed; as such, they are
viewed as “healthy states.” Table 1 shows the 15 baseline runs that were identified. For principal
component analysis (PCA) and autoassociative neural network (AANN) based methods, training data
are required; the first column of table 1 shows the runs that were selected for training (50% of
the runs, chosen with a random number generator). Table 2 shows the 33 seeded fault performance
runs; however, three of these test runs are considered a baseline condition since their gain was
set to 1.0, which is the nominal value. Note that several files contained more than one run, where
the additional runs were various levels of the same fault type.
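The random selection of training runs can be illustrated with a minimal sketch. The run IDs, the seed, and the function name are hypothetical; the report states only that about half of the baseline runs were chosen with a random number generator.

```python
import random

def split_runs(run_ids, train_fraction=0.5, seed=0):
    """Randomly pick a fraction of the runs for training; the rest are test runs."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    n_train = int(len(run_ids) * train_fraction)
    train = set(rng.sample(run_ids, n_train))
    train_runs = [r for r in run_ids if r in train]
    test_runs = [r for r in run_ids if r not in train]
    return train_runs, test_runs

baseline_runs = list(range(1, 16))  # 15 baseline runs with hypothetical IDs 1..15
train_runs, test_runs = split_runs(baseline_runs)
```

With 15 runs and a 50% fraction, this yields 7 training runs and 8 test runs, the same counts that appear in table 1.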

Table 1. Baseline performance runs.

Baseline Performance Test #   Date              MATLAB File Name        Run # in File   Train (0) or Test (1)
Training 1                    May 27, 2011      PerfM3 JP8 May27 ext    1               0
Training 2                    May 27, 2011      PerfM3 JP8 May27 ext    2               0
Test 1                        June 1, 2011      Perf Jun1 ext           1               1
Training 3                    June 3, 2011      Perfor Jun3 ext         1               0
Training 4                    June 8, 2011      Perfor Jun8 par         1               0
Test 2                        June 10, 2011     Perfor Jun10 ext        1               1
Training 5                    June 15, 2011     Perfor Jun15 ext        1               0
Test 3                        June 16, 2011     Perfor Jun16 ext        1               1
Test 4                        June 22, 2011     Perfor C Jun22 ext      1               1
Test 5                        June 29, 2011     Perfor jun29 ext        1               1
Test 6                        July 1, 2011      Perf Jul1 ext           1               1
Training 6                    July 6, 2011      Perfor Jul6 ext         1               0
Training 7                    July 8, 2011      Perfor Jul8 ext         1               0
Test 7                        July 27, 2011     Perfor Jul27 ext        1               1
Test 8                        August 3, 2011    Perfor _ext3_ext        1               1

Table 2. Seeded fault performance runs.

Test #  Date              MATLAB File Name                    Fault Type                         Run in File   Severity
9       May 27, 2011      PerfM3 IntRestr May27 ext           IntakeAir Restric Test             1             Pos #4
10      May 27, 2011      PerfM3 IntRestr May27 ext           IntakeAir Restric Test             2             Pos #6
11      June 8, 2011      PerfM3 OilP Jun8 par                OilPress High Gain                 1             Gain 1.0
12      June 8, 2011      PerfM3 OilP Jun8 par                OilPress High Gain                 2             Gain 0.7
13      June 8, 2011      PerfM3 OilP Jun8 par                OilPress High Gain                 3             Gain 1.3
14      June 10, 2011     PerfM3 AirChgT Jun10 ext            Air Charge Temperature Increase    1             Increased by 20°F
15      June 10, 2011     PerfM3 AirChgT Jun10 ext            Air Charge Temperature Increase    2             Increased by 30°F
16      June 10, 2011     PerfM3 AirChgT Jun10 ext            Air Charge Temperature Increase    3             Increased by 50°F
17      June 15, 2011     Perfor3 AirRestr Jun15 ext          AirRestriction Low                 1             Pos #2
18      June 15, 2011     Perfor3 AirRestr Jun15 ext          AirRestriction Low                 2             Pos #3
19      June 15, 2011     Perfor3 AirRestr Jun15 ext          AirRestriction Low                 3             Pos #4
20      June 15, 2011     Perfor3 B AirRestr Jun15 ext        AirRestriction High                1             Pos #5
21      June 15, 2011     Perfor3 B AirRestr Jun15 ext        AirRestriction High                2             Pos #6
22      June 15, 2011     Perfor3 C AirChgT high Jun15 ext    AirChgHigh                         1
23      June 15, 2011     Perfor3 C AirChgT high Jun15 ext    AirChgHigh                         2
24      June 16, 2011     PerforM3 AirChg low Jun16 ext       AirCharge                          1
25      June 16, 2011     PerforM3 AirChg low Jun16 ext       AirCharge                          2
26      June 16, 2011     PerforM3 AirChg low Jun16 ext       AirCharge                          3
27      June 29, 2011     PerfM3 B AirIntRes Jun29 ext        IntRestriction                     1             Pos #5
28      June 29, 2011     PerfM3 B AirIntRes Jun29 ext        IntRestriction                     2             Pos #6
29      June 29, 2011     PerfM3 B AirIntRes Jun29 ext        IntRestriction                     3             Pos #7
30      July 6, 2011      PerforM3 B BoostG Jul6 ext          Boost                              1             Gain 0.85
31      July 6, 2011      PerforM3 B BoostG Jul6 ext          Boost                              2             Gain 0.95
32      July 6, 2011      PerforM3 B BoostG Jul6 ext          Boost                              3             Gain 1.00
33      July 13, 2011     PerforM3 ExhRestr Jul13 ext         ExhRestr                           1             60%
34      July 13, 2011     PerforM3 ExhRestr Jul13 ext         ExhRestr                           2             55%
35      July 13, 2011     PerforM3 ExhRestr Jul13 ext         ExhRestr                           3             50%
36      July 13, 2011     PerforM3 B ExhRestr Jul13 ext       ExhRestr                           1             42%
37      July 13, 2011     PerforM3 B ExhRestr Jul13 ext       ExhRestr                           2             46%
38      July 13, 2011     PerforM3 B ExhRestr Jul13 ext       ExhRestr                           3             50%
39      August 3, 2011    PerforM3 InjPresG ext3 ext          InjPress                           1             Gain 1.0
40      August 4, 2011    PerforM3 InjPresG ext3 ext          InjPress                           2             Gain 0.9
41      August 5, 2011    PerforM3 InjPresG ext3 ext          InjPress                           3             Gain 1.1
From the data described, it was decided to work initially with the 45 signals from the CAN and
dyno. This set of low-cost sensor and CAN bus signals provides a path to a practical onboard
implementation and is therefore examined first, before vibration and other signals are considered
for developing health models. These signals had also already been interpolated and aligned to the
same 1 Sample/s acquisition rate, so they were in a format that was ready to process. The CAN and
dyno signals are identified in table 3. The 32 signals highlighted in orange were used in the
analysis. The other 13 signals were not included because they are either operating conditions or
have a low amount of variability.
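As an illustration of aligning a signal to a common 1 Sample/s grid, the sketch below uses linear interpolation with numpy.interp. The interpolation method is an assumption, since the report does not state how the data were interpolated, and the function name and arguments are hypothetical.

```python
import numpy as np

def resample_to_1hz(t, values, t_start, t_end):
    """Linearly interpolate a signal onto a 1 Sample/s time grid.

    t must be increasing; t_start and t_end are in seconds.
    """
    grid = np.arange(t_start, t_end + 1.0, 1.0)  # one point per second, inclusive
    return grid, np.interp(grid, t, values)

# Hypothetical signal sampled every 2 s, resampled to every 1 s.
grid, resampled = resample_to_1hz([0.0, 2.0, 4.0], [0.0, 20.0, 40.0], 0.0, 4.0)
```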

Table 3. Signals recorded from CAN and dyno.

Signal #   Sensor Name     Signal #   Sensor Name       Signal #   Sensor Name
1          Time            16         Speed             31         P-ExhB4Turbo2
2          EngSp           17         Torque            32         P-ExhStack
3          Load%           18         Throttle Pos      33         T-OilGallev
4          EngOilP         19         Lambda            34         P-OilGallev
5          Boost           20         AirFlow           35         ECM1-Boost
6          InjCtrlP        21         BB-Torque-Sen     36         Sensor-Boost
7          EngCoolT        22         T-IntAirMani      37         ECM1-InjPres
8          IntManiAirT     23         T-aftCompr        38         Sensor-InjPres
9          Pedal%          24         CoolAftEngine     39         ECM1-OilPres
10         ElPot           25         T-ExhB4Turbo1     40         Sensor-OilPres
11         Fuel Rate       26         T-ExhB4Turbo2     41         ECM1-EngCoolT
12         DesEngSp        27         T-ExhStack        42         Sensor-EngCoolT
13         NomFric%        28         P-AirB4Mani       43         ECM1-AirIntMani
14         Load@Sp         29         P-aftTurbo        44         Sensor-AirIntMani
15         Fuel Flow       30         P-ExhB4Turbo1     45         Event

4.  Modeling  Approaches 


Modeling  approaches  identified  for  the  CAT  7  data,  in  order  of  complexity,  included  single 
parameter  monitoring,  correlation  analysis,  PCA  monitoring  methods,  and  AANN  residual 
methods.  Each  of  these  methods  has  its  advantages  and  disadvantages;  table  4  lists  each  method 
along  with  important  trade-offs.  For  the  present  study,  only  correlation  analysis  and  PCA 
monitoring  were  undertaken.  Single  parameter  modeling  is  very  simple  and  can  be  done  in  real 
time;  however,  it  requires  extensive  experience  for  setting  thresholds  for  each  variable,  thus  only 
an  example  of  its  application  is  presented  at  this  time.  The  AANN  method  is  well  suited  for  this 
data  and  is  being  considered  for  future  work. 
Table 4. Advantages and disadvantages for each modeling approach identified for the CAT 7 data.

Method                        Training Requirements          Threshold Setting            Positive/Negative

Single parameter monitoring   None                           Experience or requires       Simple but does not take into account
                                                             historical data              relationship among variables

Correlation analysis          Multiple baseline data sets    Experience or requires       Takes into account variable interaction
                                                             historical data              but a less established method than PCA

PCA monitoring                Multiple baseline data sets    Established statistical      Well-established method but does not
                                                             limits                       account for nonlinear interaction
                                                                                          between variables

AANN method                   Multiple baseline data sets    Statistical limits           Handles nonlinear variable interaction
                              and more computation                                        but requires more computation for
                                                                                          training

4.1  Single  Parameter  Monitoring 

Single parameter monitoring assumes that degradation in engine performance can be evaluated from
one or more of the signals independently. It appears likely that this method can be applied to
these data but, as mentioned, the fault threshold for each signal must be set by an expert. Here we
present only an example of how the method could be applied. When the exhaust was restricted, the
exhaust gas temperature rose above that of the baseline runs, as shown in figure 4. If it were
known that an exhaust stack temperature above 1100 °F at an operating speed of 1450 RPM indicated
a blockage in the exhaust stack, then the faulted condition could be identified. Note that this
example does not take into account other variables that might cause the exhaust stack temperature
to increase.
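The exhaust stack example can be written as a simple threshold check. This is only a sketch: the 1100 °F limit and 1450 RPM set point come from the example above, while the speed tolerance and the function name are assumptions introduced for illustration.

```python
def exhaust_blockage_suspected(speed_rpm, stack_temp_f,
                               speed_setpoint=1450.0, speed_tol=25.0,
                               temp_limit_f=1100.0):
    """Flag a possible exhaust blockage from one temperature reading.

    speed_tol (RPM) is a hypothetical tolerance for deciding that the
    engine is at the 1450 RPM operating point.
    """
    at_speed = abs(speed_rpm - speed_setpoint) <= speed_tol
    return at_speed and stack_temp_f > temp_limit_f
```

As the text notes, a check like this ignores other causes of a high stack temperature, which is the main weakness of single parameter monitoring.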
Figure 4. Exhaust stack temperature for an exhaust restriction of 50% compared with baseline data.


4.2  Correlation  Analysis 

The  main  concept  of  this  approach  is  to  look  for  correlation  changes  with  respect  to  a  template 
file.  The  seven  baseline  runs  identified  for  training  in  table  1  were  used  as  the  template.  The  list 
of  processing  steps  follows: 

1. Select the regime and signal subset.

2. Perform the correlation matrix calculation for the baseline/template.

3. Perform the correlation matrix calculation for the test run.

4. Calculate the correlation difference matrix with respect to the template file.

5. Calculate a figure of merit (FOM) for the test run.

6. Classify health using a FOM threshold from the receiver operating characteristic curve
(ROC curve).

The initial step includes the option of considering a particular operating regime or signal subset.
A FOM value is calculated and provides a single indicator that can be used to assess the health
condition of the engine. Additionally, searching the correlation difference matrix for maximum
changes provides a way of identifying which signals are contributing to the anomalous performance.

1. Signal Subset and Operating Regime

The correlation analysis focused on the 32 signals highlighted in table 3. Only data in the
operating regime with engine speed above 1500 RPM and engine load above 80% were considered; this
includes all of the standard run data but excludes the deviations from the standard run that are
present in the actual data.
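The regime selection amounts to a filter on each 1 Sample/s record. In the sketch below, each record is assumed to be a dictionary keyed by the signal names of table 3; using EngSp and Load% as the speed and load signals is an assumption, as the report does not say which recorded signals defined the regime.

```python
def select_regime(samples, min_speed_rpm=1500.0, min_load_pct=80.0):
    """Keep only the samples above the speed and load limits."""
    return [s for s in samples
            if s["EngSp"] > min_speed_rpm and s["Load%"] > min_load_pct]

# Hypothetical records: only the second one is inside the regime.
samples = [
    {"EngSp": 1450.0, "Load%": 95.0},
    {"EngSp": 1800.0, "Load%": 90.0},
    {"EngSp": 2000.0, "Load%": 40.0},
]
in_regime = select_regime(samples)
```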


2.  Correlation  and  Correlation  Difference  Matrix 


Correlation between two signals, s_i and s_j, is defined as the covariance between those two
signals normalized by the standard deviation of each signal, \sigma_{s_i} and \sigma_{s_j}, as
shown in equation 1.

r_{ij} = \frac{\mathrm{Cov}(s_i, s_j)}{\sigma_{s_i}\,\sigma_{s_j}}    (1)

where covariance is defined as the expected value expression in the numerator (equation 2), with
\mu_{s_i} the mean of signal s_i. Note that the correlation is calculated for each signal pair and
provides a matrix that is N x N in size, where N is the number of signals (2).

r_{ij} = \frac{E\left[(s_i - \mu_{s_i})(s_j - \mu_{s_j})\right]}{\sigma_{s_i}\,\sigma_{s_j}}    (2)

The correlation difference matrix is generated by subtracting the elements of the template
correlation matrix from those of the run correlation matrix and squaring the result to produce a
magnitude (equation 3).

d_{ij} = \left(r_{ij}^{\mathrm{Run}} - r_{ij}^{\mathrm{Template}}\right)^{2}    (3)

Correlation  difference  matrix  plots  are  shown  in  figures  5  and  6.  Figure  5  shows  difference 
plots  for  two  of  the  baseline  runs  (healthy)  and  indicates  a  low  level  of  variation  in  healthy 
sets.  Figure  6  is  for  an  exhaust  restriction  run  and  shows  distinct  differences  in  correlation 
from  the  healthy  template,  particularly,  in  the  T-ExhStack  sensor  compared  with  almost 
every  other  sensor. 
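A minimal sketch of the correlation difference matrix, and of the search for the signal pair with the largest change, is given below. It assumes the run and template data are arrays with one row per sample and one column per signal; the function names and the toy data are hypothetical.

```python
import numpy as np

def correlation_difference(run, template):
    """Squared element-wise difference of the run and template correlation matrices."""
    r_run = np.corrcoef(run, rowvar=False)       # N x N correlation matrix of the run
    r_tpl = np.corrcoef(template, rowvar=False)  # N x N correlation matrix of the template
    return (r_run - r_tpl) ** 2

def largest_change(d, names):
    """Return the signal pair with the largest correlation change and its value."""
    i, j = np.unravel_index(np.argmax(d), d.shape)
    return names[i], names[j], float(d[i, j])

# Toy data: signal y follows x in the template but is inverted in the run.
x = np.arange(10.0)
template = np.column_stack([x, x, x ** 2])
run = np.column_stack([x, -x, x ** 2])
d = correlation_difference(run, template)
first, second, change = largest_change(d, ["x", "y", "z"])
```

In the toy data the x-y correlation flips from +1 to -1, so that pair shows the largest squared difference, in the same way that the T-ExhStack pairs stand out for the exhaust restriction run.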
[Figure 5 panels: Correlation Difference Matrix Plot for File #3 (Baseline Test Jun1), left;
Correlation Difference Matrix Plot for File #6 (Baseline Test Jun10), right; axes: Signal #.]

Figure 5. Correlation difference matrices for baseline runs (left) 3 and (right) 6.

Figure 6. Correlation difference matrix for a 46% exhaust restriction (run 36).

3. FOM Calculation and Health Assessment

For each test run, a FOM value based on the correlation difference matrix was calculated. The
FOM was defined as the summation of the values of the correlation difference matrix (equation 4):

FOM = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}    (4)

To  evaluate  health  of  the  system  based  on  this  FOM,  there  needs  to  be  a  threshold  established 
above  which  the  engine  will  be  considered  to  be  in  a  faulted  state.  The  receiver  operating 
characteristic  curve  (ROC  curve)  is  a  common  way  of  showing  classification/detection  results  as 
a  function  of  false  positives  and  false  negatives  as  a  threshold  is  varied  (3).  Figure  7  shows  the 
ROC  curve  for  the  FOMs  of  this  data.  In  this  case,  a  threshold  of  approximately  44.6  for  the 
FOM  value  offers  the  best  trade-off.  This  provides  a  false  alarm  rate  of  5.56%  and  a  missed 
detection  of  46.7%. 
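The FOM and the threshold sweep behind a ROC curve can be sketched as follows. The criterion used here for the "best trade-off" (the smallest sum of false alarm and missed detection rates) is an assumption, since the report does not state how the 44.6 threshold was chosen; the function names and example values are hypothetical.

```python
def figure_of_merit(d):
    """Sum all elements of a correlation difference matrix (list of rows)."""
    return sum(sum(row) for row in d)

def rates_at_threshold(healthy_foms, faulted_foms, threshold):
    """False alarm and missed detection rates when FOM > threshold means faulted."""
    false_alarm = sum(f > threshold for f in healthy_foms) / len(healthy_foms)
    missed = sum(f <= threshold for f in faulted_foms) / len(faulted_foms)
    return false_alarm, missed

def best_threshold(healthy_foms, faulted_foms):
    """Pick the candidate threshold with the smallest sum of the two error rates."""
    candidates = sorted(set(healthy_foms) | set(faulted_foms))
    return min(candidates,
               key=lambda t: sum(rates_at_threshold(healthy_foms, faulted_foms, t)))

# Hypothetical FOM values for healthy and faulted runs.
healthy = [10.0, 20.0, 30.0]
faulted = [40.0, 50.0, 60.0]
threshold = best_threshold(healthy, faulted)
```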
Figure 7. ROC curve for the correlation-based FOM.

The FOM health values are shown along with the threshold in figure 8. For the baseline runs, 10
of the 11 runs were classified correctly as healthy (9.1% false alarm rate). Note that three
additional  baseline  runs  were  added  and  given  a  later  test  number  since  these  are  from  the  files  in 
which  the  gain  was  varied,  but  for  these  runs  the  gain  was  1.0.  For  the  seeded  fault  runs,  14  of 
the  30  were  misclassified  as  healthy.  The  missed  detection  rate  is  quite  high  and  highlights  that 
this  method  has  difficulty  in  detecting  lower  levels  of  degradation  (associated  with  the  lower 
levels  of  particular  faults).  There  is  some  implication,  however,  that  some  lower  fault  levels  may 
not  degrade  engine  performance,  and  it  may  not  be  correct  to  call  them  “unhealthy.”  In  general, 
this  method  is  detecting  the  more  severe  induced  faults  but  not  the  lower  levels  of  the  same  fault. 
[Figure 8 plot title: Figure of Merit Correlation Health Value.]

Figure 8. Correlation FOM for all runs. The first eight are baseline runs and the remainder are
by test number from table 2. The plot excludes training runs.

4.3  Principal  Component  Analysis 

The primary concept is to extract useful information from the data set by projecting the data onto
a new set of orthogonal coordinates. PCA does this by performing an eigenvalue/eigenvector
calculation on the covariance matrix (4). Its uses in data analysis are diverse; for health
monitoring, the application here, it is used both for dimension reduction (5) and to calculate
monitoring statistics (6). Specifically, the statistics T² and square prediction error (SPE) are
calculated for the block of data in each operating regime. The mean of each health value over that
block is compared against thresholds derived from statistical theory to decide the health status.
If any values are above those thresholds, contribution plots are used to further identify the
source of the fault. The steps that were followed are listed below. Details of the basic PCA
calculations are omitted; the reader is referred to references 4-8 for specifics on the use of
PCA and the formulae used:

1. Select the regime and signal subset.

2. Normalize the data and calculate the covariance matrix for the training set.

3. Perform eigenvalue/eigenvector calculation of the covariance matrix.

4. Save PCA baseline models with normalization and eigenvalue/eigenvector information.

5. Normalize and project data from the monitored engine using baseline models.

6. Calculate the T² and SPE health statistics and calculate the mean of these for the block of
data.

7. Calculate the top contributors for each fault.
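Steps 2 through 5 can be sketched with NumPy as shown below. This is an illustration under stated assumptions, not the authors' implementation: normalization is taken to mean subtracting the training mean and dividing by the training standard deviation, and the function names and the dictionary layout of the saved model are hypothetical.

```python
import numpy as np

def fit_pca_baseline(train):
    """Build a baseline PCA model from training data (rows: samples, columns: signals)."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    z = (train - mean) / std                 # step 2: normalize
    cov = np.cov(z, rowvar=False)            # step 2: covariance of normalized data
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 3: eigen-decomposition (symmetric matrix)
    order = np.argsort(eigvals)[::-1]        # rank by decreasing eigenvalue
    return {"mean": mean, "std": std,        # step 4: information to save
            "eigvals": eigvals[order], "eigvecs": eigvecs[:, order]}

def project(model, data, r):
    """Step 5: normalize new data with the baseline model and keep r components."""
    z = (data - model["mean"]) / model["std"]
    return z @ model["eigvecs"][:, :r]
```

np.linalg.eigh is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, hence the explicit re-sorting.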

The  initial  step  is  to  select  operating  regimes  and  the  signal  list.  The  signals  were  the  same  as 
those  used  for  the  correlation  analysis.  Four  regimes  were  selected,  which  represent  steady  state 
operating  points  in  the  performance  test  runs,  as  shown  in  table  5.  To  avoid  transient  effects,  the 
first  and  last  20  s  in  a  particular  operating  regime  were  not  included  in  the  calculations. 
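Dropping the first and last 20 s of each regime block is a simple slice at 1 Sample/s. The sketch below assumes each block is a list of consecutive samples; returning an empty list for a block too short to trim is an assumption about how such blocks would be handled.

```python
def trim_transients(block, sample_rate_hz=1.0, trim_seconds=20.0):
    """Remove the first and last trim_seconds of a block of consecutive samples."""
    k = int(trim_seconds * sample_rate_hz)
    if len(block) <= 2 * k:
        return []  # assumed handling: nothing usable remains after trimming
    return block[k:len(block) - k]
```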


Table 5. Operating regimes for PCA analysis.

Regime No   Engine RPM   Engine Load   Pedal %
1           1620-1820    60-100        80-100
2           1820-2020    60-100        80-100
3           2020-2200    60-100        80-100
4           2220-2420    60-100        80-100

After calculating the principal components, it is seen that the first few principal components
explain most of the variation in the data. The typical approach for determining the number of
principal components to retain is to rank the eigenvalues in decreasing order and select the ones
that together explain a high percentage of the variability in the data. In this analysis, the
percentage was set to 85%. As an example, in the case of Regime 4, the top principal component
accounts for 37% of the variability in the data set and the first eight account for 85%. Figure 9
shows the decay in the variability for the first 10 principal components; the variability
continues to decay for the remaining 22.
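The retention rule can be sketched as a cumulative sum over the ranked eigenvalues. The function name and the example eigenvalues are hypothetical; only the 85% target comes from the text.

```python
def components_for_variance(eigenvalues, target=0.85):
    """Smallest number of top-ranked eigenvalues whose share of the total reaches target."""
    ranked = sorted(eigenvalues, reverse=True)
    total = sum(ranked)
    cumulative = 0.0
    for count, value in enumerate(ranked, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count
    return len(ranked)

# Hypothetical eigenvalues: cumulative shares are 50%, 80%, 90%, 100%.
retained = components_for_variance([5.0, 3.0, 1.0, 1.0])
```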
Figure 9. Percent of variability explained by each principal component (Regime 4).

Critical to the use of PCA for health monitoring is the calculation of the monitoring statistics
T² and SPE, which are defined here; a full description is available in reference 7. The T²
calculation is similar to the Mahalanobis distance but is performed with the principal components
(in our case, the retained principal components) instead of the original data matrix (equation 5).

T^{2} = \{u\}_{1 \times r}\,[\Lambda]_{r \times r}^{-1}\,\{u\}_{1 \times r}^{T}    (5)

where r is the number of retained principal components, \{u\}_{1 \times r} are the retained
principal components, and [\Lambda]_{r \times r} are the retained eigenvalues.

The residuals, E, which are essentially the difference between the actual data values and the
model, are then calculated (equation 6):

\{E\}_{1 \times n} = \{x\}_{1 \times n} - \{u\}_{1 \times r}\,[P]_{n \times r}^{T}    (6)

where n is the number of signals, \{x\}_{1 \times n} are the actual signal values, and
[P]_{n \times r} are the retained eigenvectors.

SPE is the sum of the squared residuals (summed over the residuals for each sensor) (equation 7):

SPE = \sum_{i=1}^{n} E_i^{2}    (7)
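Equations 5 through 7 can be illustrated for a single normalized sample. The sketch assumes the eigenvectors are stored as columns sorted by decreasing eigenvalue and that the sample has already been normalized with the baseline model; the function name and the toy values are hypothetical.

```python
import numpy as np

def t2_and_spe(x, eigvecs, eigvals, r):
    """T-squared and SPE for one normalized sample x (length n)."""
    p = eigvecs[:, :r]                        # retained eigenvectors, n x r
    u = x @ p                                 # retained principal components, length r
    t2 = float(np.sum(u ** 2 / eigvals[:r]))  # equation 5 with a diagonal eigenvalue matrix
    e = x - u @ p.T                           # equation 6: residual for each signal
    spe = float(np.sum(e ** 2))               # equation 7: sum of squared residuals
    return t2, spe

# Toy model: identity eigenvectors, two retained components.
t2, spe = t2_and_spe(np.array([2.0, 1.0, 0.5]), np.eye(3),
                     np.array([4.0, 1.0, 0.25]), r=2)
```

In the toy case the first two coordinates are captured by the model and contribute to T², while the third coordinate is left in the residual and contributes to SPE.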
To make health assessments based on the T² and SPE values, thresholds were calculated using the
commonly accepted techniques described below (8, 9). The threshold for T² is calculated using
equation 8, where r is the number of principal components retained, m is the number of samples
in the training data set, \alpha is the confidence level, and F is the F value from the
F-distribution table:

T_{\alpha}^{2} = \frac{r\,(m - 1)}{m - r}\,F_{r,\,m-r,\,\alpha}    (8)
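Equation 8 reduces to a one-line calculation once the F critical value is known. In this sketch the critical value is passed in as an argument, as if read from an F-distribution table; with SciPy installed it could instead be obtained from scipy.stats.f.ppf(0.99, r, m - r). The function name and example numbers are hypothetical.

```python
def t2_threshold(r, m, f_critical):
    """T-squared threshold for r retained components and m training samples.

    f_critical is the F value for (r, m - r) degrees of freedom at the
    chosen confidence level, supplied by the caller.
    """
    return r * (m - 1) / (m - r) * f_critical

# Hypothetical numbers: r = 2 components, m = 12 samples, F critical value 4.0.
limit = t2_threshold(2, 12, 4.0)
```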


The SPE threshold calculation is provided in equation 9. It is quite involved because the
distribution of SPE is a summation of chi-square distributions; see reference 9 for more details.
c_{\alpha} is the Z-value corresponding to a given confidence level (normal distribution table)
and \lambda_j are the eigenvalues calculated from the training data set:

SPE_{\alpha} = \theta_1 \left[ \frac{c_{\alpha}\sqrt{2\,\theta_2\,h_0^{2}}}{\theta_1} + 1 + \frac{\theta_2\,h_0\,(h_0 - 1)}{\theta_1^{2}} \right]^{1/h_0}    (9)

where  \theta_i = \sum_{j=r+1}^{n} \lambda_j^{\,i},  i = 1, 2, 3

and  h_0 = 1 - \frac{2\,\theta_1\,\theta_3}{3\,\theta_2^{2}}

PCA Results

Based on calculations for a 99% confidence level, the detection results are very good for most
cases; the detection rates are shown in table 6. The results, in general, improve with engine
speed. The false alarm rate is zero in all regimes except Regime 1. Even so, the two false alarms
in Regime 1 had values that were only slightly above the detection threshold. Plots of SPE and
T² for each regime, along with the thresholds, are presented in figures 10-17. Note that the
x-axes are the original test numbers and that the y-axes are on a logarithmic scale because of the
large range in values. Notably, the majority of the missed detections were for air intake
restrictions. For example, four of the five missed detections in Regime 4 occurred for the air
intake seeded fault, which significantly reduces the overall detection rate. Also of value, the
monitoring statistics show a correlation with fault severity, in that faults of increasing
severity had higher (more degraded) health values.
Table 6. Results of SPE and T².

Regime   Fault Detection Rate (SPE)   False Alarm Rate (SPE)   Fault Detection Rate (T²)   False Alarm Rate (T²)
1        80%                          18.18%                   73.33%                      0.00%
2        80%                          0.00%                    70.00%                      0.00%
3        83.33%                       0.00%                    73.33%                      0.00%
4        83.33%                       0.00%                    83.33%                      0.00%
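As a check on the arithmetic, the percentages in table 6 are consistent with the 30 seeded fault runs and 11 healthy test runs quoted for the correlation analysis: five missed detections out of 30 give the 83.33% detection rate in Regime 4, and two false alarms out of 11 give the 18.18% false alarm rate in Regime 1. The helper below is a hypothetical illustration of that calculation.

```python
def rate_percent(count, total):
    """Express count out of total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)

detection_regime4 = rate_percent(30 - 5, 30)  # 25 of 30 seeded faults detected
false_alarm_regime1 = rate_percent(2, 11)     # 2 of 11 healthy runs flagged
```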
Figure 10. Regime 1 SPE health values for each run.

Figure 11. Regime 1 T² health values for each run.

Figure 12. Regime 2 SPE health values for each run.

Figure 13. Regime 2 T² health values for each run.

Figure 14. Regime 3 SPE health values for each run.

Figure 15. Regime 3 T² health values for each run.

Figure 16. Regime 4 SPE health values for each run.

Figure 17. Regime 4 T² health values for each run.

It is of interest to know which signals had the greatest contribution for a particular fault. The
T² and SPE contribution values for each of the signals are calculated using equations 10 and 11
(9). The results are presented in tables 7 and 8. An additional benefit from this study is that
the contribution results could provide insight into which signals are important for single
parameter monitoring.

T^{2}_{\mathrm{contribute},k} = \{u\}_{1 \times r}\,[\Lambda]_{r \times r}^{-1}\,\left( \{x_k\}_{1 \times 1}\,[P_k]_{1 \times r} \right)^{T}    (10)

SPE_{\mathrm{contribute},k} = E_k^{2}    (11)

where k varies from 1 to n for both equations 10 and 11.

In general, the top contributors for the SPE agree with a physical understanding of the engine
and the faults that were seeded. For example, for the seeded boost faults the boost sensor is the
main contributor, and likewise for the seeded exhaust restriction faults the exhaust pressure
sensor shows the largest contribution. The contribution results from the T² are more difficult to
interpret in some instances; for example, the boost sensor is the top contributor for the higher
exhaust restriction faults. The faults that were induced by adjusting a sensor gain gave virtually
the same contribution results for both T² and SPE; however, the mechanical faults based on
restricting airflow resulted in different top contributors. Whether this is always the case would
require further experimentation and study. Figure 18 shows a particular run, 50% exhaust
restriction, with the relative contributions of each sensor from SPE. In this case, the exhaust
pressure sensor has a much higher contribution than the others, with the second largest
contributor being the exhaust gas temperature.
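Ranking the sensors by their SPE contribution can be sketched as below. The sketch assumes that each sensor's contribution is its squared residual, which is a common convention and may differ in detail from the report's formula; the function name and the residual values are hypothetical.

```python
def top_contributors(residuals, names, count=2):
    """Return the sensor names with the largest squared residuals."""
    squared = {name: e * e for name, e in zip(names, residuals)}
    return sorted(squared, key=squared.get, reverse=True)[:count]

# Hypothetical residuals for three sensors in one faulted run.
leaders = top_contributors([0.1, -2.0, 0.5],
                           ["Boost", "P-ExhStack", "T-ExhStack"])
```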

Table 7. Signal contribution to fault detection from T² calculation.

Test #  MATLAB File Name                    Fault Type                  Severity            T² Contribution 1    T² Contribution 2
9       PerfM3 IntRestr May27 ext           IntakeAir Restric Test      Pos #4              P-ExhB4Turbo2        P-ExhB4Turbo1
10      PerfM3 IntRestr May27 ext           IntakeAir Restric Test      Pos #6              ECM1-Boost           Sensor-Boost
12      PerfM3 OilP Jun8 par                OilPress High Gain          Gain 0.7            ECM1-OilPres         EngOilP
13      PerfM3 OilP Jun8 par                OilPress High Gain          Gain 1.3            ECM1-OilPres         EngOilP
14      PerfM3 AirChgT Jun10 ext            AirCharge Temp high Shift   Increased by 20°F   Sensor-AirIntMani    ECM1-AirIntMani
15      PerfM3 AirChgT Jun10 ext            AirCharge Temp high Shift   Increased by 30°F   IntManiAirT          Sensor-AirIntMani
16      PerfM3 AirChgT Jun10 ext            AirCharge Temp high Shift   Increased by 50°F   IntManiAirT          Sensor-AirIntMani
17      Perfor3 AirRestr Jun15 ext          AirRestriction Low          Pos #2              Sensor-InjPres       ECM1-InjPres
18      Perfor3 AirRestr Jun15 ext          AirRestriction Low          Pos #3              InjCtrlP             Sensor-InjPres
19      Perfor3 AirRestr Jun15 ext          AirRestriction Low          Pos #4              P-ExhB4Turbo2        P-ExhB4Turbo1
20      Perfor3 B AirRestr Jun15 ext        AirRestriction High         Pos #5              P-ExhB4Turbo2        P-ExhB4Turbo1
21      Perfor3 B AirRestr Jun15 ext        AirRestriction High         Pos #6              ECM1-Boost           Sensor-Boost
22      Perfor3 C AirChgT high Jun15 ext    AirChgHigh                                      IntManiAirT          Sensor-AirIntMani
23      Perfor3 C AirChgT high Jun15 ext    AirChgHigh                                      T-IntAirMani         Sensor-AirIntMani
24      PerforM3 AirChg low Jun16 ext       AirCharge                                       Sensor-AirIntMani    ECM1-AirIntMani
25      PerforM3 AirChg low Jun16 ext       AirCharge                                       ECM1-AirIntMani      IntManiAirT
26      PerforM3 AirChg low Jun16 ext       AirCharge                                       P-ExhB4Turbo2        InjCtrlP
27      PerfM3 B AirIntRes Jun29 ext        IntRestriction              Pos #5              P-ExhB4Turbo2        ECM1-EngCoolT
28      PerfM3 B AirIntRes Jun29 ext        IntRestriction              Pos #6              P-ExhB4Turbo2        T-ExhB4Turbo2
29      PerfM3 B AirIntRes Jun29 ext        IntRestriction              Pos #7              ECM1-Boost           Sensor-Boost
30      PerforM3 B BoostG Jul6 ext          Boost                       Gain 0.85           ECM1-Boost           Boost
31      PerforM3 B BoostG Jul6 ext          Boost                       Gain 0.95           Sensor-Boost         P-ExhB4Turbo2
33      PerforM3 ExhRestr Jul13 ext         ExhRestr                    60%                 P-ExhB4Turbo2        P-ExhStack
34      PerforM3 ExhRestr Jul13 ext         ExhRestr                    55%                 P-ExhB4Turbo2        P-ExhStack
35      PerforM3 ExhRestr Jul13 ext         ExhRestr                    50%                 ECM1-Boost           Sensor-Boost
36      PerforM3 B ExhRestr Jul13 ext       ExhRestr                    42%                 Sensor-Boost         ECM1-Boost
37      PerforM3 B ExhRestr Jul13 ext       ExhRestr                    46%                 Sensor-Boost         ECM1-Boost
38      PerforM3 B ExhRestr Jul13 ext       ExhRestr                    50%                 ECM1-Boost           Sensor-Boost
40      PerforM3 InjPresG ext3 ext          InjPress                    Gain 0.9            Sensor-InjPres       T-ExhB4Turbo2
41      PerforM3 InjPresG ext3 ext          InjPress                    Gain 1.1            Sensor-InjPres       T-ExhB4Turbo2
Table 5. Signal contribution to fault detection from SPE calculation.

Test#  MatLAB File Name                  Fault Type                 Severity           Q Contribution 1     Q Contribution 2
9      PerfM3 IntRestr May27 ext         IntakeAir Restric Test     Pos #4             'P-ExhB4Turbo2'      'P-AirB4Mani'
10     PerfM3 IntRestr May27 ext         IntakeAir Restric Test     Pos #6             'AirFlow'            'T-ExhB4Turbo2'
12     PerfM3 OilP Jun8 par              OilPress High Gain         Gain 0.7           'ECMI-OilPres'       'EngOilP'
13     PerfM3 OilP Jun8 par              OilPress High Gain         Gain 1.3           'ECMI-OilPres'       'EngOilP'
14     PerfM3 AirChgT Jun10 ext          AirCharge Temp high Shift  Increased by 20°F  'T-IntAirMani'       'IntManiAirT'
15     PerfM3 AirChgT Jun10 ext          AirCharge Temp high Shift  Increased by 30°F  'T-IntAirMani'       'IntManiAirT'
16     PerfM3 AirChgT Jun10 ext          AirCharge Temp high Shift  Increased by 50°F  'T-IntAirMani'       'IntManiAirT'
17     Perfor3 AirRestr Jun15 ext        AirRestriction Low         Pos #2             'T-ExhB4Turbo2'      'P-ExhStack'
18     Perfor3 AirRestr Jun15 ext        AirRestriction Low         Pos #3             'Torque'             'P-ExhB4Turbo1'
19     Perfor3 AirRestr Jun15 ext        AirRestriction Low         Pos #4             'P-ExhB4Turbo2'      'AirFlow'
20     Perfor3 B AirRestr Jun15 ext      AirRestriction High        Pos #5             'P-ExhB4Turbo2'      'AirFlow'
21     Perfor3 B AirRestr Jun15 ext      AirRestriction High        Pos #6             'AirFlow'            'P-ExhB4Turbo2'
22     Perfor3 C AirChgT high Jun15 ext  AirChgHigh                 -                  'T-IntAirMani'       'T-ExhB4Turbo2'
23     Perfor3 C AirChgT high Jun15 ext  AirChgHigh                 -                  'Sensor-AirIntMani'  'ECMI-AirIntMani'
24     PerforM3 AirChg low Jun16 ext     AirCharge                  -                  'T-IntAirMani'       'ECMI-EngCoolT'
25     PerforM3 AirChg low Jun16 ext     AirCharge                  -                  'Sensor-AirIntMani'  'ECMI-EngCoolT'
26     PerforM3 AirChg low Jun16 ext     AirCharge                  -                  'P-ExhB4Turbo2'      'P-aftTurbo'
27     PerfM3 B AirIntRes Jun29 ext      IntRestriction             Pos #5             'P-ExhB4Turbo2'      'AirFlow'
28     PerfM3 B AirIntRes Jun29 ext      IntRestriction             Pos #6             'P-ExhB4Turbo2'      'AirFlow'
29     PerfM3 B AirIntRes Jun29 ext      IntRestriction             Pos #7             'AirFlow'            'P-ExhB4Turbo2'
30     PerforM3 B BoostG Jul6 ext        Boost                      Gain 0.85          'ECM1-Boost'         'Sensor-Boost'
31     PerforM3 B BoostG Jul6 ext        Boost                      Gain 0.95          'ECM1-Boost'         'Sensor-Boost'
33     PerforM3 ExhRestr Jul13 ext       ExhRestr                   60%                'P-ExhStack'         'P-ExhB4Turbo2'
34     PerforM3 ExhRestr Jul13 ext       ExhRestr                   55%                'P-ExhStack'         'P-ExhB4Turbo2'
35     PerforM3 ExhRestr Jul13 ext       ExhRestr                   50%                'P-ExhStack'         'T-ExhStack'
36     PerforM3 B ExhRestr Jul13 ext     ExhRestr                   42%                'P-ExhStack'         'T-ExhStack'
37     PerforM3 B ExhRestr Jul13 ext     ExhRestr                   46%                'P-ExhStack'         'T-ExhStack'
38     PerforM3 B ExhRestr Jul13 ext     ExhRestr                   50%                'P-ExhStack'         'T-ExhStack'
40     PerforM3 InjPresG ext3 ext        Inj Press                  Gain 0.9           'Sensor-InjPres'     'ECMI-InjPres'
41     PerforM3 InjPresG ext3 ext        Inj Press                  Gain 1.1           'Sensor-InjPres'     'ECMI-InjPres'

Figure  18.  SPE  contribution  plot,  showing  the  relative  contribution  of  each  signal  (50%  exhaust  restriction). 
5.  Discussion 


Several items of interest emerged from this preliminary study, including the relative performance of the methods evaluated and salient characteristics of the results. First, when faults were applied, differences in the sensor outputs were detected. It is therefore reasonable to assume that single parameter modeling can be applied to health assessment of the engine. Again, the caveat is that expert knowledge is required to set signal thresholds, which prevents our use of this method at this time. Such information is difficult to come by, but if it were to become available, the method would be simple to implement.

Second, PCA is better suited than correlation analysis to detecting faults in these data. The primary drawback of correlation analysis here is its inability to detect lower-level faults. Several aspects of the PCA results on this data set deserve attention. The results improve with engine speed, and we speculate that the effects of a fault are exacerbated as the speed increases. It is a matter for further study why the faults induced by adjusting a sensor gain gave the same contribution results for both T² and SPE, while the mechanically induced faults gave different top contributors.

Finally, neither method detected the intake air restriction faults except at the highest restriction states. At this time, we can only suggest that the lower states do not appear to have a significant effect on engine performance. This emphasizes a point regarding the nature of this testing: although named seeded faults, the runs may be more accurately described as perturbations in operating variables and are not faults in the traditional sense. These perturbations may or may not adversely affect engine performance. With this in mind, our work addresses the detectability of the perturbations; whether or not they are critical to the actual "health" of the engine is uncertain.
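
One possible reason the two PCA statistics can point to different signals is that they measure different things: T² measures variation inside the retained principal component subspace, while SPE measures the residual outside it. The sketch below, in Python with NumPy, illustrates this on synthetic data. It is an illustration only and not the analysis performed in this study; the six synthetic channels, the two retained components, and the empirical 99th-percentile limits are all assumptions chosen for the example.

```python
import numpy as np

def fit_pca(Z, n_components):
    """PCA on standardized baseline data Z (rows = samples)."""
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s ** 2 / (Z.shape[0] - 1)  # variances of the scores
    return Vt[:n_components].T, eigvals[:n_components]

def t2_and_spe(Z, P, eigvals):
    """Hotelling's T² and SPE (Q) for each row of Z."""
    scores = Z @ P
    t2 = np.sum(scores ** 2 / eigvals, axis=1)
    residual = Z - scores @ P.T
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

# Synthetic baseline: two groups of three correlated channels.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 6))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

P, eigvals = fit_pca(Z, n_components=2)
t2_base, spe_base = t2_and_spe(Z, P, eigvals)

# Empirical limits from the baseline (assumption; theoretical limits also exist).
t2_limit = np.percentile(t2_base, 99)
spe_limit = np.percentile(spe_base, 99)

# Case 1: one channel shifted by 4 standard deviations, others at their mean.
sensor_shift = np.zeros(6)
sensor_shift[0] = 4.0
# Case 2: a large excursion that stays exactly along the first principal direction.
along_pc1 = 6.0 * np.sqrt(eigvals[0]) * P[:, 0]

t2_new, spe_new = t2_and_spe(np.vstack([sensor_shift, along_pc1]), P, eigvals)
for label, t2, spe in zip(["sensor shift", "along PC1"], t2_new, spe_new):
    print(f"{label:12s} T2 alarm: {t2 > t2_limit}  SPE alarm: {spe > spe_limit}")
```

In this toy setup the single-channel shift breaks the correlation structure and raises SPE, while the excursion along the first principal direction preserves the correlation structure and raises only T². Whether a similar mechanism explains the differences between tables 4 and 5 is not established here.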


6.  Conclusion  and  Recommendations 


Single parameter monitoring, correlation analysis, and PCA with two independent statistics (T² and SPE) all show applicability to this problem. As discussed, single parameter monitoring can be pursued further if signal thresholds become available. Encouragingly, both PCA statistics and correlation analysis detect the majority of the faults. PCA by far outperformed correlation analysis, and of the two, any further work should focus on PCA. For the PCA method, various model refinements are possible, such as adjusting how many principal components to retain, using a smaller sensor subset, or incorporating the analog and Penn State data in the analysis. Finally, we recommend evaluating the data using nonlinear PCA; AANN is a proven way of implementing this approach (11). The motivation for using the AANN approach is the belief that some of the sensors in the engine have a nonlinear relationship/correlation.
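
One of the refinements mentioned above, adjusting how many principal components to retain, is often approached by examining the cumulative fraction of variance explained. The following Python/NumPy sketch shows that idea on synthetic data. The 95% target, the function name, and the data are assumptions for illustration; this report does not state which retention criterion was used.

```python
import numpy as np

def n_components_for_variance(X, target=0.95):
    """Smallest number of components whose cumulative explained
    variance reaches `target` (rows = samples, columns = sensors)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, descending
    explained = s ** 2 / np.sum(s ** 2)     # variance fraction per component
    cumulative = np.cumsum(explained)
    # First index where the cumulative fraction reaches the target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return k, cumulative

# Synthetic data: six channels driven by two hidden factors plus small noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(1000, 6))

k, cumulative = n_components_for_variance(X, target=0.95)
print("components retained:", k)
print("cumulative variance:", np.round(cumulative, 3))
```

Because the synthetic channels are driven by two hidden factors, two components are enough to reach the target here. With real engine data the appropriate number would have to be judged against detection performance, not variance alone.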
List  of  Symbols,  Abbreviations,  and  Acronyms 


AANN        auto-associative neural network based methods
ARL         U.S. Army Research Laboratory
CAN         controller-area network
DAQ         data acquisition system
dyno        dynamometer
FOM         figure of merit
MIS         Millennium Integrated Services
P&D         prognostics and diagnostics
PCA         principal component analysis
Penn State  Pennsylvania State University
ROC         receiver operating characteristic curve
SPE         square prediction error
TARDEC      U.S. Army Tank and Automotive Research, Development and Engineering Center